Conversation
EngHabu left a comment
LGTM, @pingsutw @wild-endeavor should advise though
```python
def cluster_service(self) -> ClusterService:
    return self._cluster_service

async def get_dataproxy_for_resource(self, operation: int, resource: object) -> DataProxyService:
```
Should this operation be an enum instead?
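A minimal sketch of what typing `operation` with an enum could look like. The enum name and members below are hypothetical stand-ins (the real values would come from the generated `SelectClusterRequest.Operation` proto enum); `IntEnum` is used because protobuf enums behave like ints, so this stays wire-compatible:

```python
from enum import IntEnum

# Hypothetical stand-in for the generated proto enum; member names are
# illustrative, not the SDK's actual values.
class Operation(IntEnum):
    UNSPECIFIED = 0
    CREATE_UPLOAD_LOCATION = 1
    UPLOAD_INPUTS = 2

# The signature becomes self-documenting instead of taking a bare int.
async def get_dataproxy_for_resource(operation: Operation, resource: object):
    ...

# IntEnum members compare equal to plain ints, so callers passing the
# raw proto value keep working.
print(Operation.UPLOAD_INPUTS == 2)
```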
```python
# Build the SelectClusterRequest with the right oneof field
req = cluster_payload_pb2.SelectClusterRequest(operation=operation)
if hasattr(resource, "DESCRIPTOR"):
    field_map = {
        "OrgIdentifier": "org_id",
        "ProjectIdentifier": "project_id",
        "TaskIdentifier": "task_id",
        "ActionIdentifier": "action_id",
        "ActionAttemptIdentifier": "action_attempt_id",
    }
    field_name = field_map.get(type(resource).__name__)
    if field_name:
        getattr(req, field_name).CopyFrom(resource)
```
Is this the best way to create the oneof 😬, @wild-endeavor @pingsutw ?
we can do something like

```python
req = SelectClusterRequest(operation=operation)
if hasattr(resource, "DESCRIPTOR"):
    oneof = req.DESCRIPTOR.oneofs_by_name["resource"]  # replace with actual oneof name
    for field in oneof.fields:
        if field.message_type is resource.DESCRIPTOR:
            getattr(req, field.name).CopyFrom(resource)
            break
```
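The descriptor-matching technique suggested here can be exercised against any message with a message-typed oneof. As a self-contained stand-in for `SelectClusterRequest` (whose generated module isn't available here), the sketch below uses the well-known `google.protobuf.Value`, whose `kind` oneof has message-typed `struct_value` / `list_value` fields:

```python
from google.protobuf import struct_pb2

# Stand-in resource message (plays the role of e.g. ProjectIdentifier).
resource = struct_pb2.Struct()
resource.fields["cluster"].string_value = "primary"

# Stand-in request message with a oneof (plays the role of SelectClusterRequest).
value = struct_pb2.Value()
oneof = value.DESCRIPTOR.oneofs_by_name["kind"]
for field in oneof.fields:
    # Scalar oneof fields have message_type None, so this only matches
    # the field whose message type equals the resource's own descriptor.
    if field.message_type is resource.DESCRIPTOR:
        getattr(value, field.name).CopyFrom(resource)
        break

# The oneof is now populated via the matched field, no name table needed.
print(value.WhichOneof("kind"))
```

Compared with the hand-maintained `field_map`, this derives the mapping from the proto descriptor itself, so adding a new identifier type to the oneof requires no SDK change.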
```python
resp = await self._cluster_service.select_cluster(req)
cluster_endpoint = resp.cluster_endpoint
```
Are you going to normalize this here, stripping/adding http/s or dns:///?
not necessary! `create_session_config` already calls `normalize_rpc_endpoint`
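For readers unfamiliar with the helper being referenced: the function below is a hypothetical illustration of the kind of normalization being discussed (stripping an http/https scheme and prefixing a gRPC `dns:///` target), not the SDK's actual `normalize_rpc_endpoint` implementation:

```python
# Illustrative sketch only; the real normalize_rpc_endpoint lives in the
# SDK and may behave differently.
def normalize_rpc_endpoint(endpoint: str) -> str:
    # Drop any scheme the caller may have included.
    for prefix in ("http://", "https://", "dns:///"):
        if endpoint.startswith(prefix):
            endpoint = endpoint[len(prefix):]
    # gRPC channel targets use the dns:/// resolver syntax.
    return f"dns:///{endpoint}"

print(normalize_rpc_endpoint("https://flyte.example.com:443"))
```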
```python
if field_name:
    getattr(req, field_name).CopyFrom(resource)

resp = await self._cluster_service.select_cluster(req)
```
Let's make sure we are throwing informative errors here for the infamous "no healthy clusters" error; we now have a good place to catch that and raise a clear error to the user.
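A self-contained sketch of the kind of error handling this comment is asking for. All names here (`NoHealthyClustersError`, `select_cluster_or_raise`, the fake service) are illustrative, not the SDK's actual API; the point is catching the empty-endpoint case at the SelectCluster call site and surfacing an actionable message:

```python
import asyncio
from types import SimpleNamespace


class NoHealthyClustersError(RuntimeError):
    """Raised when SelectCluster returns no usable endpoint (illustrative)."""


async def select_cluster_or_raise(cluster_service, req):
    resp = await cluster_service.select_cluster(req)
    if not resp.cluster_endpoint:
        raise NoHealthyClustersError(
            f"no healthy clusters available for operation {req.operation!r}; "
            "verify cluster health in the control plane before retrying"
        )
    return resp


class FakeClusterService:
    """Stand-in service whose SelectCluster reports no endpoint."""

    async def select_cluster(self, req):
        return SimpleNamespace(cluster_endpoint="")


async def demo():
    try:
        await select_cluster_or_raise(FakeClusterService(), SimpleNamespace(operation=2))
    except NoHealthyClustersError as e:
        return str(e)
    return None


error_message = asyncio.run(demo())
print(error_message)
```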
…luster-conns-for-dataproxy-svc
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
…luster-conns-for-dataproxy-svc
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
```python
    return self._trigger_service

@property
def cluster_service(self) -> ClusterService:
```
Let's not expose this? Do we need it anywhere at all?
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
```python
    operation: cluster_payload_pb2.SelectClusterRequest.Operation,
    project_id: identifier_pb2.ProjectIdentifier,
) -> DataProxyService:
    from flyte._logging import logger
```
this is not coroutine-safe. This will lead to a race condition.
Signed-off-by: Katrina Rogan <katroganGH@gmail.com>
```python
)
return await client.upload_inputs(request)

async def _resolve(
```
you should just be using `alru_cache` on this, right?
```python
if existing is not None:
    return await existing

loop = asyncio.get_running_loop()
```
why do we need this? Why do we need a future? If there is a client, can we just not cache it? If you use `alru_cache` you can simply delete all of this code: just return the cached client and let `alru_cache` handle all of this work for you.
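For context on the race being discussed: a naive "check dict, await build, store result" cache can build the client twice when two coroutines miss concurrently. The stdlib-only sketch below shows the task-caching pattern that avoids this (caching the in-flight task rather than the finished result), which is also what a library like `alru_cache` handles internally; names here are illustrative:

```python
import asyncio

build_count = 0


async def build_client(endpoint: str) -> str:
    """Stand-in for expensive channel/client construction."""
    global build_count
    build_count += 1
    await asyncio.sleep(0)  # yield, so concurrent callers can interleave
    return f"client<{endpoint}>"


# Cache maps endpoint -> in-flight or finished Task, never a bare result.
_cache: dict = {}


async def get_client(endpoint: str) -> str:
    task = _cache.get(endpoint)
    if task is None:
        # Store the Task synchronously on first miss; later callers that
        # arrive before it finishes await the very same Task.
        task = asyncio.ensure_future(build_client(endpoint))
        _cache[endpoint] = task
    return await task


async def main():
    # Five concurrent callers race on the same endpoint.
    return await asyncio.gather(*(get_client("dns:///c1") for _ in range(5)))


clients = asyncio.run(main())
print(build_count, clients[0])
```

Because the Task is inserted before any `await`, the check-then-set window never spans a suspension point, so the client is built exactly once.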
NOT TO BE MERGED UNTIL [BE](flyteorg/flyte#7184) CHANGES ARE IN.

This PR adds a new dependency on the flyte remote client for dataproxy endpoints:

1. Use a client conn cache for dataproxy operations, using 'operation' and 'resource' as cache keys
2. On cache misses, call SelectCluster to get the respective cluster endpoint and initialize and cache a new client conn

Going forward all new dataproxy calls should use this pattern. Today this PR updates the only uses of dataproxy in the sdk: UploadInputs and CreateUploadLocation

---

_Testing_

Verified that the CreateRun path (which calls UploadInputs now) works with a local cluster

```
flyte -vvv --config .flyte/config-oss-local.yaml run -p flytesnacks -d development examples/basics/hello.py main
...
╭────────────────────────────────────────── Remote Run ───────────────────────────────────────────╮
│ Created Run: rmj7wwft78l69stlj69j │
│ URL: http://localhost:30080/v2/domain/development/project/flytesnacks/runs/rmj7wwft78l69stlj69j │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
```

ref 26-353

Signed-off-by: Katrina Rogan <katroganGH@gmail.com>